lectures.alex.balgavy.eu

Lecture notes from university.

Testing characteristics of samples.md (2889B)


+++
title = 'Testing characteristics of samples (goodness-of-fit, independence, homogeneity)'
template = 'page-math.html'
+++

# Testing characteristics of samples (goodness-of-fit, independence, homogeneity)

## Goodness-of-fit

Checks whether an observed frequency distribution fits a claimed distribution.
The sample has size n and is divided into k different categories.

Hypotheses:
- $H_{0}$: frequency counts agree with the claimed distribution
- $H_{A}$: frequency counts do not agree with the claimed distribution

$O_{i}$ is the observed frequency count of category *i*. $E_{i} = n \times p_{i}$ is the expected frequency count, where $p_{i}$ is the probability of category *i* under the claimed distribution.

Test statistic is:
$\chi^{2} = \sum_{i=1}^{k}\frac{(O_{i} - E_{i})^{2}}{E_{i}}$

and has approximately a chi-square distribution with k − 1 degrees of freedom under the null hypothesis.

Decision:

- critical value: reject null hypothesis if $\chi^{2} > \chi^{2}_{k-1, \alpha}$
- P value: reject null hypothesis if $P(\chi^{2} \geq x^{2}) < \alpha$, where $x^{2}$ is the observed value of the test statistic

The test is right-tailed because only large values of the test statistic indicate a poor fit (even if the hypothesis is undirected).

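A minimal sketch of the goodness-of-fit calculation in plain Python, as a way to check the formula by hand. The die counts are made up for illustration, the function name `goodness_of_fit` is hypothetical, and the critical values for $\alpha = 0.05$ are copied from a standard chi-square table rather than computed.

```python
# Critical values chi^2_{df, 0.05}, copied from a standard chi-square table.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom)."""
    n = sum(observed)                           # sample size
    expected = [n * p for p in probabilities]   # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1         # df = k - 1


# Made-up counts for 60 rolls of a die, tested against a fair die (p_i = 1/6).
observed = [8, 9, 12, 11, 6, 14]
statistic, df = goodness_of_fit(observed, [1 / 6] * 6)

# Right-tailed test: reject H0 only if the statistic exceeds the critical value.
reject = statistic > CRITICAL_005[df]
print(f"chi2 = {statistic:.2f}, df = {df}, reject H0: {reject}")
```

Here every expected count is 10, so the statistic is 42/10 = 4.2, which is below the tabled value 11.070 for 5 degrees of freedom, and the null hypothesis is not rejected.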
## Test of independence

When: two variables in a *single sample*

You have a contingency table with r row categories and c column categories, and check whether the row and column variables are dependent.

- $H_{0}$: row and column variables are independent
- $H_{A}$: row and column variables are dependent

Test statistic:

$\chi^2 = \sum_{cells} \frac{(O-E)^{2}}{E}$

where O is the observed count in a cell and E = (row total × column total) / (grand total) is its expected count under $H_{0}$. Under $H_{0}$ the statistic has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

Reject null hypothesis if $\chi^{2} > \chi^{2}_{(r-1)(c-1), \alpha}$

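The independence test can be sketched in plain Python as follows. The 2-by-3 table is invented for illustration, `chi_square_table` is a hypothetical helper name, and the expected counts use the usual (row total × column total) / (grand total) rule. The critical values for $\alpha = 0.05$ are again taken from a standard table.

```python
# Critical values chi^2_{df, 0.05}, copied from a standard chi-square table.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for an r-by-c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count of the cell if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df


# Made-up single sample of 200 units, cross-classified by two variables.
table = [[20, 30, 50],
         [30, 30, 40]]
statistic, df = chi_square_table(table)
reject = statistic > CRITICAL_005[df]
print(f"chi2 = {statistic:.3f}, df = {df}, reject H0: {reject}")
```

For this table the statistic is 28/9 ≈ 3.111 with 2 degrees of freedom, which is below 5.991, so independence is not rejected.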
## Test of homogeneity

When: comparing two or more samples to see if they have the same proportions of characteristics.

There are r different populations (rows) and c different categories (columns) of some variable; we check whether the proportions of the categories are the same in all populations.

- $H_{0}$: different populations have the same proportions of some characteristics
- $H_{A}$: different populations don’t have the same proportions of some characteristics

Test statistic:

$\chi^{2} = \sum_{cells} \frac{(O-E)^2}{E}$

which under $H_{0}$ has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

Reject $H_{0}$ if observed $\chi^{2} > \chi^{2}_{(r-1)(c-1),\alpha}$

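A small sketch of the homogeneity test, written so that the null hypothesis is visible in the code. Under $H_{0}$ all populations share the same category proportions, so these are estimated by pooling the samples, and each expected count is (sample size × pooled proportion). The three samples of 100 are made up, and the function name `homogeneity_test` is hypothetical.

```python
# Critical values chi^2_{df, 0.05}, copied from a standard chi-square table.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def homogeneity_test(samples):
    """Each row of `samples` holds the category counts of one population."""
    sample_sizes = [sum(row) for row in samples]
    total = sum(sample_sizes)
    # Pooled estimate of each category's proportion, assuming H0 holds.
    pooled = [sum(col) / total for col in zip(*samples)]

    statistic = 0.0
    for size, row in zip(sample_sizes, samples):
        for proportion, observed in zip(pooled, row):
            expected = size * proportion
            statistic += (observed - expected) ** 2 / expected

    df = (len(samples) - 1) * (len(samples[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df


# Made-up counts: three populations, two categories (has / lacks a trait).
samples = [[30, 70],
           [40, 60],
           [50, 50]]
statistic, df = homogeneity_test(samples)
reject = statistic > CRITICAL_005[df]
print(f"chi2 = {statistic:.3f}, df = {df}, reject H0: {reject}")
```

The pooled proportions are 0.4 and 0.6, giving a statistic of 25/3 ≈ 8.333 with 2 degrees of freedom. That exceeds 5.991, so $H_{0}$ is rejected for these counts. The arithmetic is the same as in the independence test; only the sampling design and the hypotheses differ.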
## Fisher’s exact test for 2-by-2 contingency table

Either:

- $H_{0}$: row and column variables are independent
- $H_{A}$: occurrence of “first column category” is more common in group of “first row category” than in group of “second row category”

or:

- $H_{0}$: populations have the same proportion of one characteristic
- $H_{A}$: the proportion of the characteristic is bigger/smaller in one population

Test statistic: the frequency count in cell (1,1), which under $H_{0}$ and given the marginals has a hypergeometric distribution

with parameters n = (first row total), N = (grand total), and k = (first column total).

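As a rough illustration of the hypergeometric calculation, the right-tailed P value (cell (1,1) at least as large as observed) can be computed with `math.comb` from the standard library. The 2-by-2 table is made up, and the helper names `hypergeom_pmf` and `fisher_right_tailed` are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """P(X = x) when n units are drawn from N, of which k are 'successes'."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def fisher_right_tailed(table):
    """P(cell (1,1) >= observed count) given the marginals, under H0."""
    (a, b), (c, d) = table
    n = a + b            # first row total
    k = a + c            # first column total
    N = a + b + c + d    # grand total
    largest = min(n, k)  # cell (1,1) cannot exceed either marginal
    return sum(hypergeom_pmf(x, N, n, k) for x in range(a, largest + 1))


# Made-up 2-by-2 table: rows are groups, columns are outcome categories.
table = [[8, 2],
         [1, 5]]
p_value = fisher_right_tailed(table)

# Here p_value = (189 + 7) / 8008, roughly 0.0245, so H0 is rejected at 0.05.
reject = p_value < 0.05
print(f"P value = {p_value:.4f}, reject H0: {reject}")
```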
I guess we don’t need to know how to do this manually.